Propositionalisation and Aggregates
Abstract
Data that is scattered over many tables causes many problems in the practice of data mining. To deal with this, one either constructs a single table by hand or uses a Multi-Relational Data Mining algorithm. In this paper, we propose a different approach in which the single table is constructed automatically using aggregate functions, which repeatedly summarise information from different tables over associations in the data model. Following the construction of the single table, we apply traditional data mining algorithms. In addition to an in-depth discussion of our approach, the paper presents results of experiments on three well-known data sets.
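As a rough illustration of the idea described in the abstract, the sketch below flattens a one-to-many association into a single table by applying aggregate functions per target record. It is not the authors' implementation: the tables (`customers`, `orders`), the column names, the helper `propositionalise`, and the chosen aggregates (count, sum, min, max, average) are all hypothetical and serve only to show the general technique.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical target table: one row per customer (the unit of analysis).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical secondary table with a one-to-many association to customers.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]


def propositionalise(target, related, key, column):
    """Summarise `column` of `related` per `key` and append the
    aggregates as new attributes to each row of `target`."""
    # Group the related values by the key of the association.
    groups = defaultdict(list)
    for row in related:
        groups[row[key]].append(row[column])

    flat = []
    for row in target:
        values = groups.get(row[key], [])
        features = dict(row)  # copy the original attributes
        features[f"{column}_count"] = len(values)
        features[f"{column}_sum"] = sum(values)
        # Aggregates that are undefined on an empty group become None.
        features[f"{column}_min"] = min(values) if values else None
        features[f"{column}_max"] = max(values) if values else None
        features[f"{column}_avg"] = mean(values) if values else None
        flat.append(features)
    return flat


table = propositionalise(customers, orders, "customer_id", "amount")
for row in table:
    print(row)
```

The resulting list of flat rows can be passed to any attribute-value learner. Applying the same step again along further associations would summarise more distant tables, at the cost of a growing number of derived attributes.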
Similar Resources
Good and Bad Practices in Propositionalisation
Data is mainly available in relational formats, so relational data mining receives a lot of interest. Propositionalisation consists of changing the representation of relational data in order to apply the usual attribute-value learning systems. Data mining practitioners are not necessarily aware of existing work and try to propositionalise by hand. Unfortunately, there exist some tempting pitfalls....
Reduction of ILP Search Space with Bottom-Up Propositionalisation
This paper introduces a method for algorithmic reduction of the search space of an ILP task, omitting the need for explicit language bias. It relies on bottom-up propositionalisation of examples and background knowledge. A proof of concept has been developed for observational learning of stratified normal logic programs.
Propositionalisation of Profile Hidden Markov Models for Biological Sequence Analysis
Hidden Markov Models are widely used generative models for analysing sequence data. One variant, the Profile Hidden Markov Model (PHMM), is a special case used in bioinformatics to represent, for example, protein families. In this paper we introduce a simple propositionalisation method for Profile Hidden Markov Models. The method allows the use of PHMMs discriminatively in a classification task. Previousl...
Approaching the ILP 2005 Challenge: Class-Conditional Bayesian Propositionalization for Genetic Classification
This report presents a statistical propositionalisation approach to relational classification and probability estimation on the genetic ILP Challenge domain. The main difference between our approach and existing propositionalisation approaches is its ability to construct features from categorical attributes with many possible values, in particular object identifiers. Our classification and rankin...
Lazy Propositionalisation for Relational Learning
A number of Inductive Logic Programming (ILP) systems have addressed the problem of learning First Order Logic (FOL) discriminant definitions by first reformulating the FOL learning problem into an attribute-value one and then applying efficient learning techniques dedicated to this simpler formalism. The complexity of such propositionalisation methods is now in the size of the reformulated pro...
Publication date: 2001